As a business analyst in Melbourne, I want to analyze how factors like pedestrian traffic patterns, business density, and cafe/restaurant activity influence retail store performance, so that I can identify optimal store locations, maximize customer engagement, and support data-driven decision-making for retail businesses.
At the end of this use case, you will:
- Data Wrangling and Preprocessing: Gain expertise in handling multiple datasets, including pedestrian traffic data, business density data, and café/restaurant activity data, cleaning and preparing them for analysis.
- Exploratory Data Analysis (EDA): Learn to visualize and analyze spatial and temporal trends in pedestrian traffic, business distributions, and associated activities to uncover patterns and insights.
- Geospatial Analysis: Develop skills in analyzing location-based data, including mapping pedestrian hotspots and correlating them with business densities.
- Data Integration: Master the integration of diverse datasets, such as combining pedestrian traffic counts with business activity and nearby amenities data, for comprehensive insights.
- Predictive Modeling: Build and evaluate models to forecast high-potential areas for retail store success based on pedestrian patterns, business activity, and complementary services.
- Visualization and Reporting: Create interactive maps and dashboards to effectively communicate insights and recommendations to business stakeholders.
- Domain Knowledge in Business & Activity: Understand the relationship between urban activity, pedestrian dynamics, and retail performance, enabling data-driven decision-making for strategic retail planning.
Urban retail landscapes are becoming increasingly competitive as cities grow, requiring businesses to make data-driven decisions to thrive. Understanding the factors that influence retail success, such as pedestrian movement patterns and business density, is crucial for identifying high-potential locations. This use case focuses on analyzing these factors to help businesses strategically position their stores for maximum visibility and engagement.
The analysis leverages two primary datasets: the Pedestrian Counting System (counts per hour) and Business Establishments Location and Industry Classification data. These datasets, sourced from Melbourne's open data portal, provide insights into pedestrian traffic dynamics and existing business density. By integrating this data, the use case aims to uncover actionable insights into high-traffic retail zones, helping businesses align their strategies with urban dynamics and enhance their competitive advantage.
This analysis supports Melbourne’s economic vitality by promoting smarter urban planning and enabling businesses to align with the city’s activity patterns for long-term success.
This dataset records hourly pedestrian counts across various locations in Melbourne, providing valuable insights into foot traffic patterns. It helps identify areas with high pedestrian activity and peak times, critical for analyzing potential retail store locations. The dataset is sourced from the Melbourne Open Data website and can be accessed via API V2.1.
This dataset details the locations and industry classifications of businesses in Melbourne, offering insights into existing retail density and types of businesses operating in specific areas. It aids in understanding competitive landscapes and complementary activities around high-potential zones. This dataset is also sourced from the Melbourne Open Data website using API V2.1.
Required Libraries and Packages¶
This section imports essential libraries for data manipulation, visualization, geospatial analysis, interactive mapping, and fetching data from APIs. These libraries provide the necessary functionality for processing, analyzing, and visualizing the project data effectively.
# Basic Libraries
import pandas as pd
import numpy as np
# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Geospatial Analysis
import geopandas as gpd
from shapely.geometry import Point
# Interactive Maps
import folium
# Data Fetching and Processing
import requests
from io import StringIO
from io import BytesIO
Loading the datasets using API v2.1¶
This section defines functions for fetching data from APIs. The API_Unlimited function retrieves datasets from the Melbourne Open Data Portal using dataset IDs, processes the data into a DataFrame, and provides a preview for verification. Similarly, the fetch_data_from_url function fetches data directly from a given URL, processes it into a DataFrame, and displays a sample for validation. These functions enable seamless access to external datasets for analysis.
# Function to collect data
def API_Unlimited(datasetname):  # pass in the dataset ID
    dataset_id = datasetname
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    export_format = 'csv'  # avoid shadowing the built-in format()
    url = f'{base_url}{dataset_id}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC'
    }
    # GET request
    response = requests.get(url, params=params)
    if response.status_code == 200:
        # StringIO to read the CSV data
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        print(dataset.sample(10, random_state=999))  # Test
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None

# Function to fetch data from a URL
def fetch_data_from_url(url):
    response = requests.get(url)
    if response.status_code == 200:
        data = pd.read_csv(StringIO(response.content.decode('utf-8')))
        print(data.head())  # Display the first few rows for verification
        return data
    else:
        print(f"Failed to fetch data: {response.status_code}")
        return None
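The two steps inside API_Unlimited, building the export URL and parsing the semicolon-delimited CSV body, can be tried without a network call. This is a minimal sketch: the helper names build_export_url and parse_export_csv are illustrative assumptions rather than part of the notebook, and the sample text reuses two rows from the preview output, reduced to four columns.

```python
from io import StringIO

import pandas as pd

BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'


def build_export_url(dataset_id, export_format='csv'):
    """Return the export endpoint for a dataset ID (hypothetical helper)."""
    return f'{BASE_URL}{dataset_id}/exports/{export_format}'


def parse_export_csv(text):
    """Parse a semicolon-delimited export body into a DataFrame (hypothetical helper)."""
    return pd.read_csv(StringIO(text), delimiter=';')


# Two rows taken from the preview output, reduced to four columns.
sample_text = (
    'location_id;sensing_date;hourday;pedestriancount\n'
    '52;2023-12-13;20;544\n'
    '23;2023-07-11;18;468\n'
)

url = build_export_url('pedestrian-counting-system-monthly-counts-per-hour')
sample_df = parse_export_csv(sample_text)
print(url)
print(sample_df)
```

Keeping the parsing step separate from the request makes it possible to check the delimiter and column names on a small string before downloading the full export.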
Fetching and Previewing Datasets¶
This section defines the dataset IDs required for the use case and fetches the corresponding data using the API_Unlimited function. The datasets include pedestrian traffic counts and business establishment details, which are essential for analyzing traffic patterns and business density. After retrieval, the code displays the first few rows of each dataset to confirm successful loading and ensure data integrity.
# Define dataset IDs for my use case
pedestrian_traffic_dataset_id = 'pedestrian-counting-system-monthly-counts-per-hour'
business_establishments_dataset_id = 'business-establishments-with-address-and-industry-classification'
# Fetch datasets
pedestrian_traffic_data = API_Unlimited(pedestrian_traffic_dataset_id)
business_establishments_data = API_Unlimited(business_establishments_dataset_id)
# Display the fetched datasets
print("\nPedestrian Traffic Dataset:")
print(pedestrian_traffic_data.head())
print("\nBusiness Establishments Dataset:")
print(business_establishments_data.head())
id location_id sensing_date hourday direction_1 \
830362 28820241006 28 2024-10-06 8 140
472623 1401820240303 140 2024-03-03 18 81
378922 311620240413 31 2024-04-13 16 198
1244588 48420220404 48 2022-04-04 4 3
567514 611020240422 61 2024-04-22 10 330
1086213 232220220108 23 2022-01-08 22 33
1599708 502320240125 50 2024-01-25 23 58
1209273 85620240105 85 2024-01-05 6 12
1141266 18120230220 18 2023-02-20 1 1
1355709 431020210818 43 2021-08-18 10 9
direction_2 pedestriancount sensor_name location
830362 72 212 VAC_T -37.82129925, 144.96879309
472623 106 187 Boyd2837_T -37.82590962, 144.96185972
378922 206 404 Lyg161_T -37.80169681, 144.96658911
1244588 4 7 QVMQ_T -37.80631581, 144.95866697
567514 469 799 RMIT14_T -37.80767455, 144.96309114
1086213 48 81 Col623_T -37.81909256, 144.95452748
1599708 21 79 Lyg309_T -37.79808192, 144.96721013
1209273 7 19 488Mac_T -37.79432415, 144.92973378
1141266 4 5 Col12_T -37.81344862, 144.97305353
1355709 12 21 UM2_T -37.79844526, 144.96411782
census_year block_id property_id base_property_id clue_small_area \
356160 2020 752 110734 110733 Southbank
204116 2004 861 102500 102500 South Yarra
41994 2005 27 102067 102067 Melbourne (CBD)
11828 2017 1109 573327 573327 Docklands
135276 2008 35 102143 102143 Melbourne (CBD)
129888 2009 12 110091 110091 Melbourne (CBD)
133276 2009 361 573297 573297 North Melbourne
344163 2018 310 628739 628739 North Melbourne
110883 2022 1101 110843 110843 Docklands
27432 2006 247 106235 106235 Carlton
trading_name \
356160 Sake Restaurant & Bar
204116 Steen & Tan Proprietary Limited
41994 Anthony Squires Fine Quality Clothes
11828 New Quay Asian Grocery
135276 Rivette And Blair
129888 Flight Centre
133276 Easli Pty Ltd
344163 Swim Communications
110883 Coffee Rush
27432 Poppy's Thai Restaurant
business_address \
356160 Part 100 St Kilda Road SOUTHBANK VIC 3006
204116 153 Domain Road SOUTH YARRA 3141
41994 Shop 8, 101 Collins Street MELBOURNE 3000
11828 15-17 Caravel Lane DOCKLANDS 3008
135276 Shop 210, Ground , 260 Collins Street MELBOURN...
129888 Shop 9, 15 William Street MELBOURNE 3000
133276 Unit 30, Ground , 1 O'Connell Street NORTH MEL...
344163 Ground , 134 Langford Street NORTH MELBOURNE 3051
110883 Shop 310, Ground 229 Spencer Street DOCKLANDS ...
27432 230 Lygon Street CARLTON 3053
industry_anzsic4_code \
356160 4511
204116 6921
41994 4254
11828 4110
135276 4255
129888 7220
133276 7311
344163 7000
110883 4511
27432 4511
industry_anzsic4_description longitude \
356160 Cafes and Restaurants 144.968170
204116 Architectural Services 144.981278
41994 Mens Clothing Retailing 144.970782
11828 Supermarket and Grocery Stores 144.940675
135276 Womens Clothing Retailing 144.964909
129888 Travel Agency and Tour Arrangement Services 144.959198
133276 Building and Other Industrial Cleaning Services 144.957973
344163 Computer System Design and Related Services 144.936880
110883 Cafes and Restaurants 144.950564
27432 Cafes and Restaurants 144.967220
latitude point
356160 -37.820942 -37.82094204031448, 144.96817034075002
204116 -37.834261 -37.8342614498216, 144.981277576097
41994 -37.814894 -37.8148943012917, 144.9707823311265
11828 -37.814922 -37.8149216286, 144.94067463890494
135276 -37.815414 -37.815413696488875, 144.96490915325
129888 -37.819294 -37.81929390638578, 144.9591980659
133276 -37.805723 -37.8057225963, 144.9579731919805
344163 -37.796126 -37.7961257162923, 144.93687974609998
110883 -37.814509 -37.814508973578526, 144.9505641426
27432 -37.801805 -37.80180542035775, 144.96721958448984
Pedestrian Traffic Dataset:
id location_id sensing_date hourday direction_1 direction_2 \
0 522020231213 52 2023-12-13 20 368 176
1 231820230711 23 2023-07-11 18 158 310
2 72220230817 72 2023-08-17 2 2 4
3 581120241106 58 2024-11-06 11 857 282
4 23620220731 23 2022-07-31 6 8 27
pedestriancount sensor_name location
0 544 Eli263_T -37.81252157, 144.9619401
1 468 Col623_T -37.81909256, 144.95452748
2 6 ACMI_T -37.81726338, 144.96872809
3 1139 Bou688_T -37.81686075, 144.95358075
4 35 Col623_T -37.81909256, 144.95452748
Business Establishments Dataset:
census_year block_id property_id base_property_id clue_small_area \
0 2021 27 103596 103596 Melbourne (CBD)
1 2021 27 103968 103968 Melbourne (CBD)
2 2021 27 103968 103968 Melbourne (CBD)
3 2021 27 103968 103968 Melbourne (CBD)
4 2021 27 103968 103968 Melbourne (CBD)
trading_name \
0 TMF Corporate Services (Aust) Pty Limited
1 Arena Reit Limited
2 Taxbanter Pty Ltd
3 Pask Group
4 Webb Martin Consulting Pty Ltd
business_address industry_anzsic4_code \
0 Part Level 9 63 Exhibition Street MELBOURNE VI... 6932
1 Suite 5, Level 5 41 Exhibition Street MELBOURN... 6720
2 Part Suite 9, Level 9 41 Exhibition Street MEL... 6932
3 Part Suite 15, Level 15 41 Exhibition Street M... 3011
4 Part Suite 9, Level 9 41 Exhibition Street MEL... 6932
industry_anzsic4_description longitude latitude \
0 Accounting Services 144.971304 -37.814602
1 Real Estate Services 144.971575 -37.815017
2 Accounting Services 144.971575 -37.815017
3 House Construction 144.971575 -37.815017
4 Accounting Services 144.971575 -37.815017
point
0 -37.814602312, 144.9713042703283
1 -37.81501688045, 144.9715754974218
2 -37.81501688045, 144.9715754974218
3 -37.81501688045, 144.9715754974218
4 -37.81501688045, 144.9715754974218
Displaying Dataset Overview¶
This part of the code verifies the datasets by displaying their dimensions and a preview of the first few rows. It ensures that the pedestrian traffic and business establishments data have been successfully loaded and are ready for analysis.
# Retrieve and display the "Pedestrian Traffic" dataset
print(f'The shape of the Pedestrian Traffic dataset is {pedestrian_traffic_data.shape}.')
print('Below are the first few rows of this dataset:')
print("Pedestrian Traffic Dataset:")
print(pedestrian_traffic_data.head())
# --------------------
# Retrieve and display the "Business Establishments" dataset
print(f'The shape of the Business Establishments dataset is {business_establishments_data.shape}.')
print('Below are the first few rows of this dataset:')
print("Business Establishments Dataset:")
print(business_establishments_data.head())
The shape of the Pedestrian Traffic dataset is (2025362, 9).
Below are the first few rows of this dataset:
Pedestrian Traffic Dataset:
id location_id sensing_date hourday direction_1 direction_2 \
0 522020231213 52 2023-12-13 20 368 176
1 231820230711 23 2023-07-11 18 158 310
2 72220230817 72 2023-08-17 2 2 4
3 581120241106 58 2024-11-06 11 857 282
4 23620220731 23 2022-07-31 6 8 27
pedestriancount sensor_name location
0 544 Eli263_T -37.81252157, 144.9619401
1 468 Col623_T -37.81909256, 144.95452748
2 6 ACMI_T -37.81726338, 144.96872809
3 1139 Bou688_T -37.81686075, 144.95358075
4 35 Col623_T -37.81909256, 144.95452748
The shape of the Business Establishments dataset is (374210, 12).
Below are the first few rows of this dataset:
Business Establishments Dataset:
census_year block_id property_id base_property_id clue_small_area \
0 2021 27 103596 103596 Melbourne (CBD)
1 2021 27 103968 103968 Melbourne (CBD)
2 2021 27 103968 103968 Melbourne (CBD)
3 2021 27 103968 103968 Melbourne (CBD)
4 2021 27 103968 103968 Melbourne (CBD)
trading_name \
0 TMF Corporate Services (Aust) Pty Limited
1 Arena Reit Limited
2 Taxbanter Pty Ltd
3 Pask Group
4 Webb Martin Consulting Pty Ltd
business_address industry_anzsic4_code \
0 Part Level 9 63 Exhibition Street MELBOURNE VI... 6932
1 Suite 5, Level 5 41 Exhibition Street MELBOURN... 6720
2 Part Suite 9, Level 9 41 Exhibition Street MEL... 6932
3 Part Suite 15, Level 15 41 Exhibition Street M... 3011
4 Part Suite 9, Level 9 41 Exhibition Street MEL... 6932
industry_anzsic4_description longitude latitude \
0 Accounting Services 144.971304 -37.814602
1 Real Estate Services 144.971575 -37.815017
2 Accounting Services 144.971575 -37.815017
3 House Construction 144.971575 -37.815017
4 Accounting Services 144.971575 -37.815017
point
0 -37.814602312, 144.9713042703283
1 -37.81501688045, 144.9715754974218
2 -37.81501688045, 144.9715754974218
3 -37.81501688045, 144.9715754974218
4 -37.81501688045, 144.9715754974218
This section performs a data quality check by identifying missing values and duplicate rows in the pedestrian traffic and business establishments datasets. This helps ensure the data is clean and ready for further analysis by highlighting potential issues that need to be addressed.
# Check for missing values in pedestrian traffic dataset
print("Missing values in Pedestrian Traffic Dataset:")
print(pedestrian_traffic_data.isnull().sum())
# Check for missing values in business establishments dataset
print("\nMissing values in Business Establishments Dataset:")
print(business_establishments_data.isnull().sum())
# Check for duplicate rows in pedestrian traffic dataset
print(f"\nDuplicate rows in Pedestrian Traffic Dataset: {pedestrian_traffic_data.duplicated().sum()}")
# Check for duplicate rows in business establishments dataset
print(f"Duplicate rows in Business Establishments Dataset: {business_establishments_data.duplicated().sum()}")
Missing values in Pedestrian Traffic Dataset:
id                 0
location_id        0
sensing_date       0
hourday            0
direction_1        0
direction_2        0
pedestriancount    0
sensor_name        0
location           0
dtype: int64
Missing values in Business Establishments Dataset:
census_year                        0
block_id                           0
property_id                        0
base_property_id                   0
clue_small_area                    0
trading_name                     127
business_address                   1
industry_anzsic4_code              0
industry_anzsic4_description       0
longitude                       4785
latitude                        4785
point                           4785
dtype: int64
Duplicate rows in Pedestrian Traffic Dataset: 0
Duplicate rows in Business Establishments Dataset: 0
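When most columns are complete, a compact report that lists only the columns with gaps can be easier to read than the full isnull().sum() output. The sketch below runs on made-up rows; the function name quality_report and the missing_pct column are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def quality_report(df):
    """Summarise missing values per column (hypothetical helper)."""
    missing = df.isnull().sum()
    report = pd.DataFrame({
        'missing': missing,
        'missing_pct': (missing / len(df) * 100).round(2),
    })
    # Keep only columns that actually have gaps, worst first.
    return report[report['missing'] > 0].sort_values('missing', ascending=False)


# Made-up rows that mimic the business establishments columns.
toy = pd.DataFrame({
    'trading_name': ['Coffee Rush', None, 'Pask Group', 'Flight Centre'],
    'longitude': [144.95, np.nan, np.nan, 144.96],
    'latitude': [-37.81, np.nan, np.nan, -37.82],
    'census_year': [2022, 2021, 2021, 2009],
})

report = quality_report(toy)
n_duplicates = int(toy.duplicated().sum())
print(report)
print(f'Duplicate rows: {n_duplicates}')
```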
- Handle Missing and Duplicate Data¶
This section addresses missing values in the business establishments dataset. The check above found no duplicate rows in either dataset, so no deduplication is needed. Categorical columns like trading_name and business_address are filled with their most frequent values (mode), while numerical columns such as longitude and latitude are filled with their mean values. Additionally, the point column, which repeats latitude and longitude as a single text field, is removed to streamline the dataset for analysis.
# Handle missing values for 'trading_name' (categorical column) using the mode
business_establishments_data['trading_name'] = business_establishments_data['trading_name'].fillna(
    business_establishments_data['trading_name'].mode()[0]
)
# Handle missing values for 'business_address' (categorical column) using the mode
business_establishments_data['business_address'] = business_establishments_data['business_address'].fillna(
    business_establishments_data['business_address'].mode()[0]
)
# Handle missing values for 'longitude' and 'latitude' (numerical columns) using the mean
business_establishments_data['longitude'] = business_establishments_data['longitude'].fillna(
    business_establishments_data['longitude'].mean()
)
business_establishments_data['latitude'] = business_establishments_data['latitude'].fillna(
    business_establishments_data['latitude'].mean()
)
# Drop the 'point' column since it's redundant
business_establishments_data = business_establishments_data.drop(columns=['point'])
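Mean imputation gives every business with missing coordinates the same location, which can later look like a dense cluster on a map. One possible way to keep track of those rows, sketched here on made-up data, is to record a flag before filling. The coords_imputed column name is an assumption and is not used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows; two of them lack coordinates.
toy = pd.DataFrame({
    'trading_name': ['Coffee Rush', 'Pask Group', 'Flight Centre', 'Easli Pty Ltd'],
    'longitude': [144.95, np.nan, 144.97, np.nan],
    'latitude': [-37.81, np.nan, -37.83, np.nan],
})

# Record which rows lack coordinates before any filling happens.
toy['coords_imputed'] = toy['longitude'].isnull() | toy['latitude'].isnull()

# Same mean fill as in the cell above.
toy['longitude'] = toy['longitude'].fillna(toy['longitude'].mean())
toy['latitude'] = toy['latitude'].fillna(toy['latitude'].mean())

# Rows with recorded coordinates only, for steps where a synthetic location would mislead.
located_only = toy[~toy['coords_imputed']]
print(toy)
print(len(located_only))
```

With the flag in place, density maps or spatial joins can be run on located_only while counts by industry or area still use every row.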
- Convert Data Types¶
This part ensures the data is in the correct format for analysis. The sensing_date column is converted to datetime format, and numerical columns like pedestriancount, longitude, and latitude are converted to numeric types. Data types are then verified to confirm the transformations were successful.
# Convert 'sensing_date' column to datetime
pedestrian_traffic_data['sensing_date'] = pd.to_datetime(pedestrian_traffic_data['sensing_date'], errors='coerce')
# Ensure numerical columns are in the correct format
pedestrian_traffic_data['pedestriancount'] = pd.to_numeric(pedestrian_traffic_data['pedestriancount'], errors='coerce')
business_establishments_data['longitude'] = pd.to_numeric(business_establishments_data['longitude'], errors='coerce')
business_establishments_data['latitude'] = pd.to_numeric(business_establishments_data['latitude'], errors='coerce')
# Verify data types
print("\nPedestrian Traffic Dataset Data Types:")
print(pedestrian_traffic_data.dtypes)
print("\nBusiness Establishments Dataset Data Types:")
print(business_establishments_data.dtypes)
Pedestrian Traffic Dataset Data Types:
id                          int64
location_id                 int64
sensing_date       datetime64[ns]
hourday                     int64
direction_1                 int64
direction_2                 int64
pedestriancount             int64
sensor_name                object
location                   object
dtype: object
Business Establishments Dataset Data Types:
census_year                       int64
block_id                          int64
property_id                       int64
base_property_id                  int64
clue_small_area                  object
trading_name                     object
business_address                 object
industry_anzsic4_code             int64
industry_anzsic4_description     object
longitude                       float64
latitude                        float64
dtype: object
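With errors='coerce', values that cannot be parsed become NaT or NaN instead of raising an error, so it can be worth counting them after conversion. The values below are made up to show that behaviour; the real datasets converted without introducing gaps, as the int64 dtype of pedestriancount above suggests.

```python
import pandas as pd

# Made-up raw values, one unparseable entry per column.
raw = pd.DataFrame({
    'sensing_date': ['2023-12-13', 'not a date', '2024-11-06'],
    'pedestriancount': ['544', 'n/a', '1139'],
})

raw['sensing_date'] = pd.to_datetime(raw['sensing_date'], errors='coerce')
raw['pedestriancount'] = pd.to_numeric(raw['pedestriancount'], errors='coerce')

# Unparseable entries are now NaT / NaN; count them per column.
coerced = raw[['sensing_date', 'pedestriancount']].isnull().sum()
print(raw.dtypes)
print(coerced)
```

A numeric column that picks up NaN this way is stored as float64, which is a quick signal that some values were coerced.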
- Verify Data Cleaning¶
This part of the code verifies the success of the data cleaning process by displaying the first few rows of the pedestrian traffic and business establishments datasets. This ensures that missing values, data type corrections, and other preprocessing steps were applied correctly, preparing the data for further analysis.
# Verify the cleaned datasets
print("Cleaned Pedestrian Traffic Dataset:")
print(pedestrian_traffic_data.head())
print("\nCleaned Business Establishments Dataset:")
print(business_establishments_data.head())
Cleaned Pedestrian Traffic Dataset:
id location_id sensing_date hourday direction_1 direction_2 \
0 522020231213 52 2023-12-13 20 368 176
1 231820230711 23 2023-07-11 18 158 310
2 72220230817 72 2023-08-17 2 2 4
3 581120241106 58 2024-11-06 11 857 282
4 23620220731 23 2022-07-31 6 8 27
pedestriancount sensor_name location
0 544 Eli263_T -37.81252157, 144.9619401
1 468 Col623_T -37.81909256, 144.95452748
2 6 ACMI_T -37.81726338, 144.96872809
3 1139 Bou688_T -37.81686075, 144.95358075
4 35 Col623_T -37.81909256, 144.95452748
Cleaned Business Establishments Dataset:
census_year block_id property_id base_property_id clue_small_area \
0 2021 27 103596 103596 Melbourne (CBD)
1 2021 27 103968 103968 Melbourne (CBD)
2 2021 27 103968 103968 Melbourne (CBD)
3 2021 27 103968 103968 Melbourne (CBD)
4 2021 27 103968 103968 Melbourne (CBD)
trading_name \
0 TMF Corporate Services (Aust) Pty Limited
1 Arena Reit Limited
2 Taxbanter Pty Ltd
3 Pask Group
4 Webb Martin Consulting Pty Ltd
business_address industry_anzsic4_code \
0 Part Level 9 63 Exhibition Street MELBOURNE VI... 6932
1 Suite 5, Level 5 41 Exhibition Street MELBOURN... 6720
2 Part Suite 9, Level 9 41 Exhibition Street MEL... 6932
3 Part Suite 15, Level 15 41 Exhibition Street M... 3011
4 Part Suite 9, Level 9 41 Exhibition Street MEL... 6932
industry_anzsic4_description longitude latitude
0 Accounting Services 144.971304 -37.814602
1 Real Estate Services 144.971575 -37.815017
2 Accounting Services 144.971575 -37.815017
3 House Construction 144.971575 -37.815017
4 Accounting Services 144.971575 -37.815017
- Sensor Locations Dataset Fetching¶
This section retrieves the pedestrian sensor locations dataset, which provides geospatial details of sensor placements, using the API_Unlimited function. A sample of the data is displayed to verify successful fetching and ensure it is ready for integration with other datasets.
# Fetch the Pedestrian Sensor Locations dataset
sensor_locations_dataset_id = 'pedestrian-counting-system-sensor-locations'
sensor_locations = API_Unlimited(sensor_locations_dataset_id)
# Verify the fetched data
print("\nSensor Locations Dataset Sample:")
print(sensor_locations.head())
location_id sensor_description \
61 65 Swanston St - City Square
93 17 Collins Place (South)
29 87 Errol St (West)
126 137 COM Pole 2353 - Towards the city, NAB Building
0 2 Bourke Street Mall (South)
24 72 Flinders St- ACMI
18 43 Monash Rd-Swanston St (West)
125 131 I-Hub Corner of King Street and Flinders Stree...
53 41 Flinders La-Swanston St (West)
140 166 484 Spencer Street
sensor_name installation_date \
61 SwaCs_T 2020-03-12
93 Col15_T 2009-03-30
29 Errol23_T 2022-05-20
126 BouHbr2353_T 2023-11-03
0 Bou283_T 2009-03-30
24 ACMI_T 2020-11-30
18 UM2_T 2015-04-15
125 King2_T 2023-09-25
53 Swa31 2017-06-29
140 Spen484_T 2024-09-25
note location_type status \
61 NaN Outdoor A
93 Device is upgraded in 26/02/2020 Outdoor A
29 NaN Outdoor A
126 NaN Outdoor A
0 NaN Outdoor A
24 NaN Outdoor A
18 NaN Outdoor A
125 NaN Outdoor A
53 NaN Outdoor A
140 Former sensor 227 Bourke Street – City Lab Outdoor A
direction_1 direction_2 latitude longitude location
61 North South -37.815694 144.966806 -37.81569416, 144.9668064
93 East West -37.813625 144.973236 -37.81362543, 144.97323591
29 North South -37.804549 144.949219 -37.80454949, 144.94921863
126 East West -37.818948 144.946123 -37.81894815, 144.94612292
0 East West -37.813807 144.965167 -37.81380668, 144.96516718
24 East West -37.817263 144.968728 -37.81726338, 144.96872809
18 North South -37.798445 144.964118 -37.79844526, 144.96411782
125 North South -37.820091 144.957587 -37.82009057, 144.95758725
53 North South -37.816686 144.966897 -37.81668634, 144.96689733
140 North South -37.808967 144.949317 -37.80896733, 144.94931703
Sensor Locations Dataset Sample:
location_id sensor_description sensor_name \
0 2 Bourke Street Mall (South) Bou283_T
1 4 Town Hall (West) Swa123_T
2 6 Flinders Street Station Underpass FliS_T
3 8 Webb Bridge WebBN_T
4 10 Victoria Point BouHbr_T
installation_date note location_type status direction_1 \
0 2009-03-30 NaN Outdoor A East
1 2009-03-23 NaN Outdoor A North
2 2009-03-25 Upgraded on 8/09/21 Outdoor A North
3 2009-03-24 NaN Outdoor A North
4 2009-04-23 NaN Outdoor A East
direction_2 latitude longitude location
0 West -37.813807 144.965167 -37.81380668, 144.96516718
1 South -37.814880 144.966088 -37.81487988, 144.9660878
2 South -37.819117 144.965583 -37.81911705, 144.96558255
3 South -37.822935 144.947175 -37.82293543, 144.9471751
4 West -37.818765 144.947105 -37.81876474, 144.94710545
- Cleaning and Validating Sensor Locations Dataset¶
This section checks the sensor locations dataset for missing values and addresses any issues by dropping rows with missing longitude or latitude. It also ensures these columns are converted to numeric types for geospatial analysis. Finally, a sample of the cleaned data is displayed to confirm that the dataset is ready for further use.
# Check for missing values and data types
print("\nMissing values in Sensor Locations Dataset:")
print(sensor_locations.isnull().sum())
# Drop rows with missing longitude/latitude
sensor_locations = sensor_locations.dropna(subset=['longitude', 'latitude'])
# Ensure longitude and latitude are numeric
sensor_locations['longitude'] = pd.to_numeric(sensor_locations['longitude'], errors='coerce')
sensor_locations['latitude'] = pd.to_numeric(sensor_locations['latitude'], errors='coerce')
# Verify cleaned data
print("\nCleaned Sensor Locations Dataset Sample:")
print(sensor_locations.head())
Missing values in Sensor Locations Dataset:
location_id             0
sensor_description      2
sensor_name             0
installation_date       2
note                  108
location_type           0
status                  0
direction_1            32
direction_2            32
latitude                0
longitude               0
location                0
dtype: int64
Cleaned Sensor Locations Dataset Sample:
location_id sensor_description sensor_name \
0 2 Bourke Street Mall (South) Bou283_T
1 4 Town Hall (West) Swa123_T
2 6 Flinders Street Station Underpass FliS_T
3 8 Webb Bridge WebBN_T
4 10 Victoria Point BouHbr_T
installation_date note location_type status direction_1 \
0 2009-03-30 NaN Outdoor A East
1 2009-03-23 NaN Outdoor A North
2 2009-03-25 Upgraded on 8/09/21 Outdoor A North
3 2009-03-24 NaN Outdoor A North
4 2009-04-23 NaN Outdoor A East
direction_2 latitude longitude location
0 West -37.813807 144.965167 -37.81380668, 144.96516718
1 South -37.814880 144.966088 -37.81487988, 144.9660878
2 South -37.819117 144.965583 -37.81911705, 144.96558255
3 South -37.822935 144.947175 -37.82293543, 144.9471751
4 West -37.818765 144.947105 -37.81876474, 144.94710545
- Merging Pedestrian Traffic Data with Sensor Locations¶
This section merges the pedestrian traffic data with the sensor locations dataset using location_id as the common key. A left join is performed to ensure all rows from the pedestrian traffic data are retained, while adding the corresponding longitude and latitude values from the sensor locations. A sample of the merged dataset is displayed to confirm the successful integration of geospatial information.
# Merge pedestrian traffic data with sensor locations
pedestrian_traffic_data = pedestrian_traffic_data.merge(
    sensor_locations[['location_id', 'longitude', 'latitude']],
    on='location_id',  # Use 'location_id' as the common key
    how='left'         # Left join to retain all rows from pedestrian_traffic_data
)
# Verify the merged dataset
print("\nPedestrian Traffic Dataset with Geospatial Data:")
print(pedestrian_traffic_data.head())
Pedestrian Traffic Dataset with Geospatial Data:
id location_id sensing_date hourday direction_1 direction_2 \
0 522020231213 52 2023-12-13 20 368 176
1 231820230711 23 2023-07-11 18 158 310
2 72220230817 72 2023-08-17 2 2 4
3 581120241106 58 2024-11-06 11 857 282
4 23620220731 23 2022-07-31 6 8 27
pedestriancount sensor_name location longitude \
0 544 Eli263_T -37.81252157, 144.9619401 144.961940
1 468 Col623_T -37.81909256, 144.95452748 144.954527
2 6 ACMI_T -37.81726338, 144.96872809 144.968728
3 1139 Bou688_T -37.81686075, 144.95358075 144.953581
4 35 Col623_T -37.81909256, 144.95452748 144.954527
latitude
0 -37.812522
1 -37.819093
2 -37.817263
3 -37.816861
4 -37.819093
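A left join silently multiplies rows if a key repeats on the right-hand side, and it leaves NaN where no match exists. pandas can check both conditions during the merge through the validate and indicator parameters. The sketch below uses made-up frames, including a hypothetical location_id of 999 that has no sensor record.

```python
import pandas as pd

# Made-up traffic rows; location_id 999 is hypothetical and has no sensor record.
traffic = pd.DataFrame({
    'location_id': [52, 23, 23, 999],
    'pedestriancount': [544, 468, 35, 10],
})
sensors = pd.DataFrame({
    'location_id': [52, 23],
    'longitude': [144.961940, 144.954527],
    'latitude': [-37.812522, -37.819093],
})

merged = traffic.merge(
    sensors,
    on='location_id',
    how='left',
    validate='many_to_one',  # raises MergeError if a location_id repeats in sensors
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

unmatched = merged[merged['_merge'] == 'left_only']
print(merged)
print(f'Rows without a matching sensor: {len(unmatched)}')
```

Comparing len(merged) with len(traffic) is a second cheap check that the join did not add rows.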
- Checking for Missing Geospatial Data After Merge¶
This section checks for any rows in the pedestrian traffic dataset that are missing geospatial data (longitude and latitude) after the merge with the sensor locations. The count of such rows is displayed to identify any remaining gaps in geospatial information that may need to be addressed.
# Check for missing longitude and latitude after merge
missing_geo = pedestrian_traffic_data[pedestrian_traffic_data['longitude'].isnull()]
print(f"\nRows with missing geospatial data: {missing_geo.shape[0]}")
Rows with missing geospatial data: 0
- Add Geometric Data for Spatial Analysis¶
This section creates a geometry column for both the pedestrian traffic and business establishments datasets. Each row's longitude and latitude are converted into Point objects using the shapely.geometry.Point class. These geometry columns are essential for geospatial analysis, enabling mapping and spatial operations.
from shapely.geometry import Point
# Create geometry column for pedestrian traffic data
pedestrian_traffic_data['geometry'] = pedestrian_traffic_data.apply(
    lambda row: Point(float(row['longitude']), float(row['latitude'])), axis=1
)
# Create geometry column for business establishments data
business_establishments_data['geometry'] = business_establishments_data.apply(
    lambda row: Point(float(row['longitude']), float(row['latitude'])), axis=1
)
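Row-wise apply calls a Python function once per row, which can be slow on the roughly two million pedestrian records. geopandas also provides points_from_xy, which builds the same points from whole columns. A minimal sketch on two rows taken from the output above; the toy and toy_gdf names are illustrative.

```python
import geopandas as gpd
import pandas as pd

# Two sensor rows taken from the merged output above.
toy = pd.DataFrame({
    'sensor_name': ['Eli263_T', 'Col623_T'],
    'longitude': [144.961940, 144.954527],
    'latitude': [-37.812522, -37.819093],
})

# points_from_xy takes x (longitude) first, then y (latitude).
toy_gdf = gpd.GeoDataFrame(
    toy,
    geometry=gpd.points_from_xy(toy['longitude'], toy['latitude']),
    crs='EPSG:4326',
)
print(toy_gdf)
```

The argument order matters: passing latitude first would place the points far outside Melbourne without raising an error.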
- Prepare GeoDataFrames¶
This section converts the pedestrian traffic and business establishments datasets into GeoDataFrames using geopandas. The geometry column created earlier is used to define spatial features, and the coordinate reference system (CRS) is set to EPSG:4326 for geographic coordinates (latitude and longitude). The resulting GeoDataFrames are then verified by displaying sample rows to confirm the data structure is suitable for geospatial analysis.
# Convert pedestrian traffic data to a GeoDataFrame
pedestrian_gdf = gpd.GeoDataFrame(pedestrian_traffic_data, geometry='geometry', crs="EPSG:4326")
# Convert business establishments data to a GeoDataFrame
business_gdf = gpd.GeoDataFrame(business_establishments_data, geometry='geometry', crs="EPSG:4326")
# Verify GeoDataFrame structures
print("\nPedestrian GeoDataFrame Sample:")
print(pedestrian_gdf.head())
print("\nBusiness GeoDataFrame Sample:")
print(business_gdf.head())
Pedestrian GeoDataFrame Sample:
id location_id sensing_date hourday direction_1 direction_2 \
0 522020231213 52 2023-12-13 20 368 176
1 231820230711 23 2023-07-11 18 158 310
2 72220230817 72 2023-08-17 2 2 4
3 581120241106 58 2024-11-06 11 857 282
4 23620220731 23 2022-07-31 6 8 27
pedestriancount sensor_name location longitude \
0 544 Eli263_T -37.81252157, 144.9619401 144.961940
1 468 Col623_T -37.81909256, 144.95452748 144.954527
2 6 ACMI_T -37.81726338, 144.96872809 144.968728
3 1139 Bou688_T -37.81686075, 144.95358075 144.953581
4 35 Col623_T -37.81909256, 144.95452748 144.954527
latitude geometry
0 -37.812522 POINT (144.96194 -37.81252)
1 -37.819093 POINT (144.95453 -37.81909)
2 -37.817263 POINT (144.96873 -37.81726)
3 -37.816861 POINT (144.95358 -37.81686)
4 -37.819093 POINT (144.95453 -37.81909)
Business GeoDataFrame Sample:
census_year block_id property_id base_property_id clue_small_area \
0 2021 27 103596 103596 Melbourne (CBD)
1 2021 27 103968 103968 Melbourne (CBD)
2 2021 27 103968 103968 Melbourne (CBD)
3 2021 27 103968 103968 Melbourne (CBD)
4 2021 27 103968 103968 Melbourne (CBD)
trading_name \
0 TMF Corporate Services (Aust) Pty Limited
1 Arena Reit Limited
2 Taxbanter Pty Ltd
3 Pask Group
4 Webb Martin Consulting Pty Ltd
business_address industry_anzsic4_code \
0 Part Level 9 63 Exhibition Street MELBOURNE VI... 6932
1 Suite 5, Level 5 41 Exhibition Street MELBOURN... 6720
2 Part Suite 9, Level 9 41 Exhibition Street MEL... 6932
3 Part Suite 15, Level 15 41 Exhibition Street M... 3011
4 Part Suite 9, Level 9 41 Exhibition Street MEL... 6932
industry_anzsic4_description longitude latitude \
0 Accounting Services 144.971304 -37.814602
1 Real Estate Services 144.971575 -37.815017
2 Accounting Services 144.971575 -37.815017
3 House Construction 144.971575 -37.815017
4 Accounting Services 144.971575 -37.815017
geometry
0 POINT (144.9713 -37.8146)
1 POINT (144.97158 -37.81502)
2 POINT (144.97158 -37.81502)
3 POINT (144.97158 -37.81502)
4 POINT (144.97158 -37.81502)
- Save Cleaned Data¶
This section saves the cleaned pedestrian traffic and business establishments datasets as CSV files. This step ensures that the processed data is preserved for future use or further analysis. A confirmation message is displayed to indicate successful saving of the datasets.
# Save cleaned pedestrian traffic data to CSV
pedestrian_traffic_data.to_csv('cleaned_pedestrian_traffic_data.csv', index=False)
# Save cleaned business establishments data to CSV
business_establishments_data.to_csv('cleaned_business_establishments_data.csv', index=False)
print("Cleaned datasets saved successfully!")
Cleaned datasets saved successfully!
Data Exploration and Visualization¶
- Visualize Pedestrian Traffic Hotspots¶
This section plots the pedestrian sensor locations from the GeoDataFrame as a first look at where foot traffic is recorded in Melbourne. The points are drawn in blue, with the axes labeled for longitude and latitude. Because every point has the same color and size, the plot shows where pedestrian activity is measured rather than how much traffic each location receives.
# Plot pedestrian traffic locations
plt.figure(figsize=(10, 8))
pedestrian_gdf.plot(ax=plt.gca(), color='blue', markersize=5, alpha=0.6, legend=True)
plt.title("Pedestrian Traffic Hotspots in Melbourne")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
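The scatter above draws every hourly record at the same size, so it cannot distinguish a busy sensor from a quiet one. A minimal sketch of one way to surface volume, assuming the column names shown in the sample output (`sensor_name`, `longitude`, `latitude`, `pedestriancount`). The helper name `summarize_sensors` and the toy rows are illustrative and not part of the original notebook:

```python
import pandas as pd


def summarize_sensors(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse hourly records to one row per sensor with its mean hourly count."""
    return (
        df.groupby("sensor_name", as_index=False)
        .agg(
            longitude=("longitude", "first"),
            latitude=("latitude", "first"),
            mean_count=("pedestriancount", "mean"),
            n_records=("pedestriancount", "size"),
        )
        .sort_values("mean_count", ascending=False)
        .reset_index(drop=True)
    )


# Toy rows based on the sample output shown earlier (hypothetical subset)
toy = pd.DataFrame(
    {
        "sensor_name": ["Eli263_T", "Col623_T", "ACMI_T", "Col623_T"],
        "longitude": [144.961940, 144.954527, 144.968728, 144.954527],
        "latitude": [-37.812522, -37.819093, -37.817263, -37.819093],
        "pedestriancount": [544, 468, 6, 35],
    }
)
sensor_summary = summarize_sensors(toy)
print(sensor_summary)
```

Passing a scaled `sensor_summary["mean_count"]` as the `s` argument of `plt.scatter` would then size each sensor by its typical volume, which is closer to what "hotspot" suggests.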
- Visualize Business Establishments¶
This section creates a geographic visualization of business establishments in Melbourne by plotting their locations on a map. The points are displayed in green, representing the spatial distribution of businesses. The plot helps in understanding the density and spread of business establishments across the city.
# Plot business locations
plt.figure(figsize=(10, 8))
business_gdf.plot(ax=plt.gca(), color='green', markersize=5, alpha=0.6, legend=True)
plt.title("Business Establishments in Melbourne")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
Combine Pedestrian and Business Data for Proximity Analysis¶
- Verify Geometry Types¶
This section verifies the geometry types in the pedestrian traffic and business establishment GeoDataFrames. It counts each geometry type (e.g., Point) to confirm that all spatial data is correctly formatted and suitable for geospatial analysis.
# Check geometry types in pedestrian GeoDataFrame
print(pedestrian_gdf.geom_type.value_counts())
# Check geometry types in business GeoDataFrame
print(business_gdf.geom_type.value_counts())
Point 2191890
Name: count, dtype: int64
Point 374210
Name: count, dtype: int64
- Perform Nearest Neighbor Analysis¶
This section performs a nearest neighbor analysis to link each business establishment to nearby pedestrian traffic data. First, longitude and latitude are converted into Point objects and both datasets are rebuilt as GeoDataFrames, with empty or missing geometries removed. The pedestrian and business coordinates are then extracted as arrays, and a cKDTree built from the pedestrian coordinates is queried for the nearest pedestrian record to each business, along with the distance between them. The matched pedestrian count and the distance are added as new columns to the business GeoDataFrame. Two caveats apply when reading the result: the distance is in decimal degrees because the coordinates are in EPSG:4326, and the pedestrian table holds one row per sensor per hour, so the matched count comes from a single hourly record at the nearest sensor location rather than an average for that sensor. The updated dataset is printed to confirm the new columns.
from scipy.spatial import cKDTree
# Step 1: Re-create Geometry Columns
# Pedestrian Data
pedestrian_traffic_data['geometry'] = pedestrian_traffic_data.apply(
    lambda row: Point(row['longitude'], row['latitude']), axis=1
)
# Business Data
business_establishments_data['geometry'] = business_establishments_data.apply(
    lambda row: Point(row['longitude'], row['latitude']), axis=1
)
# Convert to GeoDataFrames
pedestrian_gdf = gpd.GeoDataFrame(pedestrian_traffic_data, geometry='geometry', crs="EPSG:4326")
business_gdf = gpd.GeoDataFrame(business_establishments_data, geometry='geometry', crs="EPSG:4326")
# Step 2: Validate and Filter Geometries
# Remove invalid geometries for pedestrian data
pedestrian_gdf = pedestrian_gdf[~pedestrian_gdf.geometry.is_empty]
pedestrian_gdf = pedestrian_gdf[~pedestrian_gdf.geometry.isnull()]
# Remove invalid geometries for business data
business_gdf = business_gdf[~business_gdf.geometry.is_empty]
business_gdf = business_gdf[~business_gdf.geometry.isnull()]
# Step 3: Extract Coordinates for KDTree
pedestrian_coords = np.array(list(zip(pedestrian_gdf.geometry.x, pedestrian_gdf.geometry.y)))
business_coords = np.array(list(zip(business_gdf.geometry.x, business_gdf.geometry.y)))
# Ensure there are valid coordinates
if pedestrian_coords.shape[0] == 0:
    raise ValueError("No valid pedestrian coordinates found. Check the pedestrian GeoDataFrame.")
if business_coords.shape[0] == 0:
    raise ValueError("No valid business coordinates found. Check the business GeoDataFrame.")
# Step 4: Perform Nearest Neighbor Analysis
tree = cKDTree(pedestrian_coords)
distances, indices = tree.query(business_coords, k=1)
# Step 5: Add Nearest Pedestrian Data to Business GeoDataFrame
business_gdf['nearest_pedestrian_count'] = pedestrian_gdf.iloc[indices]['pedestriancount'].values
business_gdf['distance_to_nearest_pedestrian'] = distances
# Step 6: Verify the Result
print("\nBusiness GeoDataFrame with Nearest Pedestrian Data:")
print(business_gdf[['trading_name', 'nearest_pedestrian_count', 'distance_to_nearest_pedestrian']])
Business GeoDataFrame with Nearest Pedestrian Data:
trading_name nearest_pedestrian_count \
0 TMF Corporate Services (Aust) Pty Limited 50
1 Arena Reit Limited 241
2 Taxbanter Pty Ltd 241
3 Pask Group 241
4 Webb Martin Consulting Pty Ltd 241
... ... ...
374205 Vacant 108
374206 Aeon Accessories 108
374207 Swim Communications 108
374208 vacant 108
374209 Frank Samways Veterinary Clinic 108
distance_to_nearest_pedestrian
0 0.001569
1 0.001440
2 0.001440
3 0.001440
4 0.001440
... ...
374205 0.006658
374206 0.006955
374207 0.006708
374208 0.007151
374209 0.008125
[374210 rows x 3 columns]
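The `distance_to_nearest_pedestrian` values above are fractions of a degree, which makes them hard to interpret as walking distance. A minimal sketch of one way to obtain distances in meters, assuming a local equirectangular approximation is accurate enough at city scale. The function names `to_local_meters` and `nearest_in_meters` are hypothetical, and the Earth radius is an approximation:

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an approximation


def to_local_meters(lon, lat, lat0):
    """Project lon/lat (degrees) to approximate x/y meters around latitude lat0."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    x = EARTH_RADIUS_M * np.radians(lon) * np.cos(np.radians(lat0))
    y = EARTH_RADIUS_M * np.radians(lat)
    return np.column_stack([x, y])


def nearest_in_meters(sensor_lonlat, business_lonlat):
    """Return (distance_m, sensor_index) of the nearest sensor for each business."""
    sensor_lonlat = np.asarray(sensor_lonlat, dtype=float)
    business_lonlat = np.asarray(business_lonlat, dtype=float)
    lat0 = sensor_lonlat[:, 1].mean()  # shared reference latitude for both sets
    tree = cKDTree(to_local_meters(sensor_lonlat[:, 0], sensor_lonlat[:, 1], lat0))
    return tree.query(
        to_local_meters(business_lonlat[:, 0], business_lonlat[:, 1], lat0), k=1
    )


# Two sensor locations and one business location taken from the sample outputs
sensors = [(144.961940, -37.812522), (144.954527, -37.819093)]
businesses = [(144.971304, -37.814602)]
dist_m, idx = nearest_in_meters(sensors, businesses)
print(dist_m, idx)
```

Building the tree from one row per sensor, rather than from every hourly record, would also keep the tree small and make it straightforward to attach a per-sensor average instead of a single hourly count.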
Identify High-Traffic Business Zones¶
This section applies DBSCAN clustering to identify high-traffic business zones in Melbourne. To keep the computation tractable, up to 10,000 pedestrian coordinates and 10,000 business coordinates are sampled, combined, and clustered together. DBSCAN groups points by density and labels points that belong to no cluster as noise (-1). Because the coordinates are in degrees, eps=0.005 corresponds to a neighborhood radius of roughly 450 to 550 m at Melbourne's latitude. The labels for the business portion of the sample are attached to the sampled business GeoDataFrame, and the clusters are then plotted on a map to show their spatial distribution.
from sklearn.cluster import DBSCAN
# Sample size for clustering
sample_size = 10000
# Sample pedestrian and business row positions (kept so labels can be matched back to rows)
pedestrian_idx = np.random.choice(pedestrian_coords.shape[0], min(sample_size, pedestrian_coords.shape[0]), replace=False)
business_idx = np.random.choice(business_coords.shape[0], min(sample_size, business_coords.shape[0]), replace=False)
pedestrian_sample = pedestrian_coords[pedestrian_idx]
business_sample = business_coords[business_idx]
# Combine sampled data
combined_sample_coords = np.concatenate([pedestrian_sample, business_sample], axis=0)
# Perform DBSCAN clustering
db = DBSCAN(eps=0.005, min_samples=5).fit(combined_sample_coords)
# Build the sampled business GeoDataFrame from the same rows that were clustered,
# so each label lines up with the geometry it was computed from
business_sample_gdf = business_gdf.iloc[business_idx].copy()
business_sample_gdf['cluster'] = db.labels_[-len(business_sample):]
# Plot clusters for the sampled business data on an explicit axes
fig, ax = plt.subplots(figsize=(10, 8))
business_sample_gdf.plot(ax=ax, column='cluster', cmap='viridis', legend=True, markersize=5, alpha=0.6)
ax.set_title("High-Traffic Business Zones in Melbourne")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
plt.show()
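With coordinates in degrees, an `eps` of 0.005 covers a different ground distance north-south than east-west. A minimal sketch of an alternative that states the radius in meters, assuming scikit-learn's haversine metric on latitude/longitude in radians. The function name `cluster_by_meters`, the 500 m radius, and the toy points are illustrative assumptions, not the notebook's settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an approximation


def cluster_by_meters(lonlat, eps_m=500.0, min_samples=5):
    """Run DBSCAN with a radius given in meters using the haversine metric."""
    lonlat = np.asarray(lonlat, dtype=float)
    # scikit-learn's haversine metric expects [lat, lon] in radians
    latlon_rad = np.radians(lonlat[:, ::-1])
    model = DBSCAN(
        eps=eps_m / EARTH_RADIUS_M,  # meters converted to radians on the unit sphere
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit_predict(latlon_rad)


# Toy data: two tight groups about 5 km apart plus one isolated point
offsets = np.array(
    [[0, 0], [0.0003, 0], [0, 0.0003], [0.0003, 0.0003], [-0.0003, 0], [0, -0.0003]]
)
group_a = np.array([144.9631, -37.8136]) + offsets
group_b = np.array([144.9631, -37.7686]) + offsets
isolated = np.array([[145.10, -37.90]])
labels = cluster_by_meters(np.vstack([group_a, group_b, isolated]), eps_m=500.0)
print(labels)
```

The two groups should come out as separate clusters and the isolated point as noise (-1), which is the behavior the degree-based version approximates.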
Identify High-Potential Retail Zones¶
This section identifies and visualizes high-potential retail zones in Melbourne based on pedestrian traffic and business density. It first drops empty or missing geometries from both GeoDataFrames, then samples 10,000 businesses and groups them with DBSCAN. Cluster labels are mapped back to the full dataset; businesses outside the sample receive the label -1, the same label DBSCAN assigns to noise, so cluster -1 in the output consists mostly of unsampled rows. Business density is defined as the number of businesses sharing a cluster label and is added to the GeoDataFrame. Two thresholds are then set: the 75th percentile of the per-cluster business counts serves as the upper bound for "low" business density, and the 25th percentile of pedestrian counts serves as the lower bound for "high" foot traffic. Both cut-offs are deliberately loose so that more zones qualify. Businesses meeting both conditions are treated as high-potential zones (low competition, high foot traffic), printed for verification, and plotted in red over the pedestrian sensor locations.
# Ensure valid geometry in the GeoDataFrame
business_gdf = business_gdf[~business_gdf.geometry.is_empty]
business_gdf = business_gdf[~business_gdf.geometry.isna()]
pedestrian_gdf = pedestrian_gdf[~pedestrian_gdf.geometry.is_empty]
pedestrian_gdf = pedestrian_gdf[~pedestrian_gdf.geometry.isna()]
# Step 1: Sample business data
sample_size = 10000
business_sample = business_gdf.sample(n=min(sample_size, len(business_gdf)), random_state=42)
# Extract coordinates for clustering from the sample
business_coords_sample = np.array(list(zip(business_sample.geometry.x, business_sample.geometry.y)))
# Ensure there are valid coordinates
if len(business_coords_sample) == 0:
    raise ValueError("No valid business coordinates found in the sample. Check the business GeoDataFrame.")
# Step 2: Perform DBSCAN clustering on the sampled data
db = DBSCAN(eps=0.005, min_samples=5).fit(business_coords_sample)
# Step 3: Assign cluster labels to the sampled data
business_sample['cluster'] = db.labels_
# Step 4: Map sampled clusters back to the full dataset
cluster_mapping = dict(zip(business_sample.index, business_sample['cluster']))
business_gdf['cluster'] = business_gdf.index.map(cluster_mapping).fillna(-1).astype(int)
# Verify the cluster column
print("Unique clusters assigned:", business_gdf['cluster'].unique())
# Step 5: Calculate business density for each cluster
business_density = business_gdf.groupby('cluster').size()
# Step 6: Add business density information to the GeoDataFrame
business_gdf['business_density'] = business_gdf['cluster'].map(business_density)
# Verify the addition of business density
print("\nBusiness Density by Cluster:")
print(business_density.head())
# Step 7: Inspect Data Distributions
print("\nBusiness Density Summary:")
print(business_density.describe())
print("\nPedestrian Count Summary:")
print(pedestrian_gdf['pedestriancount'].describe())
# Step 8: Set thresholds for high-potential zones
threshold_density = business_density.quantile(0.75) # 75th percentile for broader inclusion
pedestrian_threshold = pedestrian_gdf['pedestriancount'].quantile(0.25) # 25th percentile for broader inclusion
# Verify thresholds
print(f"\nAdjusted Threshold for Low Business Density: {threshold_density}")
print(f"Adjusted Threshold for High Pedestrian Count: {pedestrian_threshold}")
# Step 9: Identify high-potential retail zones
high_potential_zones = business_gdf[
(business_gdf['nearest_pedestrian_count'] > pedestrian_threshold) &
(business_gdf['business_density'] < threshold_density)
]
# Debugging step: Display rows that match each condition
print("\nRows matching high pedestrian count condition:")
print(business_gdf[business_gdf['nearest_pedestrian_count'] > pedestrian_threshold])
print("\nRows matching low business density condition:")
print(business_gdf[business_gdf['business_density'] < threshold_density])
# Verify high-potential zones
if high_potential_zones.empty:
    print("\nNo high-potential retail zones found. Consider further relaxing thresholds.")
else:
    print("\nHigh-Potential Retail Zones:")
    print(high_potential_zones[['trading_name', 'nearest_pedestrian_count', 'business_density']])
# Step 10: Visualize high-potential retail zones
fig, ax = plt.subplots(figsize=(10, 8))
# Plot pedestrian traffic as base (on the axes created above, not a new figure)
pedestrian_gdf.plot(ax=ax, color='blue', markersize=5, alpha=0.6, label='Pedestrian Traffic')
if not high_potential_zones.empty:
    # Plot high-potential retail zones
    high_potential_zones.plot(ax=ax, color='red', markersize=10, alpha=0.8, label='High-Potential Zones')
ax.set_title("High-Potential Retail Zones in Melbourne")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
# Set aspect ratio to avoid errors
ax.set_aspect('equal', adjustable='datalim')
ax.legend()
plt.show()
Unique clusters assigned: [-1 0 1 2]
Business Density by Cluster:
cluster
-1 364216
0 9966
1 23
2 5
dtype: int64
Business Density Summary:
count 4.000000
mean 93552.500000
std 180503.310534
min 5.000000
25% 18.500000
50% 4994.500000
75% 98528.500000
max 364216.000000
dtype: float64
Pedestrian Count Summary:
count 2.191890e+06
mean 3.658121e+02
std 5.514674e+02
min 0.000000e+00
25% 3.900000e+01
50% 1.530000e+02
75% 4.360000e+02
max 8.895000e+03
Name: pedestriancount, dtype: float64
Adjusted Threshold for Low Business Density: 98528.5
Adjusted Threshold for High Pedestrian Count: 39.0
Rows matching high pedestrian count condition:
census_year block_id property_id base_property_id clue_small_area \
0 2021 27 103596 103596 Melbourne (CBD)
1 2021 27 103968 103968 Melbourne (CBD)
2 2021 27 103968 103968 Melbourne (CBD)
3 2021 27 103968 103968 Melbourne (CBD)
4 2021 27 103968 103968 Melbourne (CBD)
... ... ... ... ... ...
374205 2020 309 593976 593976 North Melbourne
374206 2020 310 628737 628737 North Melbourne
374207 2020 310 628739 628739 North Melbourne
374208 2020 310 628743 628743 North Melbourne
374209 2020 311 101102 101102 North Melbourne
trading_name \
0 TMF Corporate Services (Aust) Pty Limited
1 Arena Reit Limited
2 Taxbanter Pty Ltd
3 Pask Group
4 Webb Martin Consulting Pty Ltd
... ...
374205 Vacant
374206 Aeon Accessories
374207 Swim Communications
374208 vacant
374209 Frank Samways Veterinary Clinic
business_address \
0 Part Level 9 63 Exhibition Street MELBOURNE VI...
1 Suite 5, Level 5 41 Exhibition Street MELBOURN...
2 Part Suite 9, Level 9 41 Exhibition Street MEL...
3 Part Suite 15, Level 15 41 Exhibition Street M...
4 Part Suite 9, Level 9 41 Exhibition Street MEL...
... ...
374205 138-140 Langford Street NORTH MELBOURNE VIC 3051
374206 49-53 Steel Street NORTH MELBOURNE VIC 3051
374207 126-134 Langford Street NORTH MELBOURNE VIC 3051
374208 42-48 Straker Street NORTH MELBOURNE VIC 3051
374209 1-3 Boundary Road NORTH MELBOURNE VIC 3051
industry_anzsic4_code industry_anzsic4_description \
0 6932 Accounting Services
1 6720 Real Estate Services
2 6932 Accounting Services
3 3011 House Construction
4 6932 Accounting Services
... ... ...
374205 0 Vacant Space
374206 3493 Telecommunication Goods Wholesaling
374207 7000 Computer System Design and Related Services
374208 0 Vacant Space
374209 6970 Veterinary Services
longitude latitude geometry \
0 144.971304 -37.814602 POINT (144.9713 -37.8146)
1 144.971575 -37.815017 POINT (144.97158 -37.81502)
2 144.971575 -37.815017 POINT (144.97158 -37.81502)
3 144.971575 -37.815017 POINT (144.97158 -37.81502)
4 144.971575 -37.815017 POINT (144.97158 -37.81502)
... ... ... ...
374205 144.936891 -37.795840 POINT (144.93689 -37.79584)
374206 144.937127 -37.796156 POINT (144.93713 -37.79616)
374207 144.936880 -37.796126 POINT (144.93688 -37.79613)
374208 144.937226 -37.796541 POINT (144.93723 -37.79654)
374209 144.938101 -37.797013 POINT (144.9381 -37.79701)
nearest_pedestrian_count distance_to_nearest_pedestrian cluster \
0 50 0.001569 -1
1 241 0.001440 -1
2 241 0.001440 -1
3 241 0.001440 -1
4 241 0.001440 -1
... ... ... ...
374205 108 0.006658 -1
374206 108 0.006955 -1
374207 108 0.006708 -1
374208 108 0.007151 -1
374209 108 0.008125 -1
business_density
0 364216
1 364216
2 364216
3 364216
4 364216
... ...
374205 364216
374206 364216
374207 364216
374208 364216
374209 364216
[261380 rows x 16 columns]
Rows matching low business density condition:
census_year block_id property_id base_property_id clue_small_area \
87 2021 31 102111 102111 Melbourne (CBD)
117 2021 31 105945 105945 Melbourne (CBD)
123 2021 31 108968 108968 Melbourne (CBD)
180 2021 32 102119 102119 Melbourne (CBD)
264 2021 33 105937 105937 Melbourne (CBD)
... ... ... ... ... ...
373861 2021 24 110762 110762 Melbourne (CBD)
373876 2021 24 110762 110762 Melbourne (CBD)
374007 2021 27 102067 102067 Melbourne (CBD)
374114 2020 266 109849 109849 Carlton
374157 2020 270 664627 104468 Parkville
trading_name \
87 Vacant
117 Team Building Construction Pty Ltd
123 Apna Desi Indian Restaurant
180 A2M Consulting Pty Ltd
264 PP&E Valuations Pty Ltd
... ...
373861 Hearing Australia
373876 Vacant
374007 Bell Asset Management Limited
374114 RMIT Building 51 - Frederick Campbell
374157 National Australia Bank Limited
business_address \
87 Shop 1-3, 608-610 Collins Street MELBOURNE VIC...
117 Unit 5, Ground 601 Little Collins Street MELBO...
123 Shop 5, 120 Spencer Street MELBOURNE VIC 3000
180 Suite 2, Level 15 470 Collins Street MELBOURNE...
264 Suite 504-505, Level 5 443 Little Collins Stre...
... ...
373861 Part Level 5 303 Collins Street MELBOURNE VIC ...
373876 Level 9 303 Collins Street MELBOURNE VIC 3000
374007 Part Level 20 101 Collins Street MELBOURNE VIC...
374114 80-92 Victoria Street CARLTON VIC 3053
374157 Part Ground NAB Bank Building 143 230 Grattan ...
industry_anzsic4_code \
87 0
117 3019
123 4511
180 6922
264 6720
... ...
373861 8512
373876 0
374007 6419
374114 8102
374157 6221
industry_anzsic4_description longitude \
87 Vacant Space 144.954568
117 Other Residential Building Construction 144.954904
123 Cafes and Restaurants 144.954365
180 Surveying and Mapping Services 144.958334
264 Real Estate Services 144.960097
... ... ...
373861 Specialist Medical Services 144.963824
373876 Vacant Space 144.963824
374007 Other Auxiliary Finance and Investment Services 144.970773
374114 Higher Education 144.964848
374157 Banking 144.961209
latitude geometry nearest_pedestrian_count \
87 -37.818610 POINT (144.95457 -37.81861) 601
117 -37.818028 POINT (144.9549 -37.81803) 601
123 -37.818184 POINT (144.95436 -37.81818) 601
180 -37.817538 POINT (144.95833 -37.81754) 305
264 -37.816396 POINT (144.9601 -37.8164) 57
... ... ... ...
373861 -37.816628 POINT (144.96382 -37.81663) 149
373876 -37.816628 POINT (144.96382 -37.81663) 149
374007 -37.814893 POINT (144.97077 -37.81489) 50
374114 -37.806564 POINT (144.96485 -37.80656) 596
374157 -37.796772 POINT (144.96121 -37.79677) 3
distance_to_nearest_pedestrian cluster business_density
87 0.000280 0 9966
117 0.000946 0 9966
123 0.000707 0 9966
180 0.002435 0 9966
264 0.001121 0 9966
... ... ... ...
373861 0.001785 0 9966
373876 0.001785 0 9966
374007 0.001366 0 9966
374114 0.002078 0 9966
374157 0.003212 0 9966
[9994 rows x 16 columns]
High-Potential Retail Zones:
trading_name nearest_pedestrian_count \
87 Vacant 601
117 Team Building Construction Pty Ltd 601
123 Apna Desi Indian Restaurant 601
180 A2M Consulting Pty Ltd 305
264 PP&E Valuations Pty Ltd 57
... ... ...
373836 ACI Worldwide (Pacific) Pty Ltd 57
373861 Hearing Australia 149
373876 Vacant 149
374007 Bell Asset Management Limited 50
374114 RMIT Building 51 - Frederick Campbell 596
business_density
87 9966
117 9966
123 9966
180 9966
264 9966
... ...
373836 9966
373861 9966
373876 9966
374007 9966
374114 9966
[6971 rows x 3 columns]
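In the output above, the density threshold (98528.5) sits well above the size of every real cluster (9966, 23, and 5), so all 9994 businesses with a cluster label pass the density filter and only the pedestrian threshold does any filtering. A minimal sketch of a stricter variant, offered as an assumption rather than the notebook's method: drop the -1 label before computing density, then require foot traffic at or above the upper quartile and density at or below the lower quartile. The function name `flag_high_potential`, the quantile defaults, and the toy values are illustrative:

```python
import pandas as pd


def flag_high_potential(df, traffic_q=0.75, density_q=0.25):
    """Flag rows with high foot traffic and low cluster density.

    Rows labelled -1 (noise or unsampled) are excluded before thresholds are set.
    """
    clustered = df[df["cluster"] != -1].copy()
    cluster_sizes = clustered.groupby("cluster").size()
    clustered["business_density"] = clustered["cluster"].map(cluster_sizes)
    traffic_min = clustered["nearest_pedestrian_count"].quantile(traffic_q)
    density_max = cluster_sizes.quantile(density_q)
    mask = (clustered["nearest_pedestrian_count"] >= traffic_min) & (
        clustered["business_density"] <= density_max
    )
    return clustered[mask]


# Toy data: one large cluster, two single-business clusters, one unlabelled row
toy = pd.DataFrame(
    {
        "trading_name": ["A", "B", "C", "D", "E", "F", "G"],
        "cluster": [0, 0, 0, 0, 1, 2, -1],
        "nearest_pedestrian_count": [50, 60, 70, 80, 900, 40, 1000],
    }
)
result = flag_high_potential(toy)
print(result[["trading_name", "nearest_pedestrian_count", "business_density"]])
```

Only business E qualifies in the toy data: it has high foot traffic and sits in a sparse cluster, while G is ignored despite its high count because it carries no real cluster label.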
Interactive Mapping¶
This section creates an interactive map centered on Melbourne using Folium to visualize high-traffic and high-potential retail zones. Business clusters are represented with blue markers, while high-potential zones, identified based on pedestrian traffic and business density, are highlighted with red markers. This interactive map provides a dynamic way to explore and analyze the spatial distribution of business opportunities in Melbourne.
# Initialize map centered on Melbourne
m = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)
# Add high-traffic zones
for _, row in business_gdf[business_gdf['cluster'] != -1].iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=5,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(m)
# Add high-potential zones
for _, row in high_potential_zones.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=7,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.8
    ).add_to(m)
# Display the map
m
Statistical Analysis¶
This section generates summary statistics for business clusters, calculating metrics such as average business density, average pedestrian count, and the number of businesses in each cluster. It aggregates data by clusters, providing insights into cluster characteristics, and saves the summary to a CSV file for further analysis or reporting. This allows for a more comprehensive understanding of business cluster dynamics.
# Summary statistics for clusters
cluster_summary = business_gdf.groupby('cluster').agg({
'business_density': 'mean',
'nearest_pedestrian_count': ['mean', 'max', 'min'],
'longitude': 'count' # Number of businesses per cluster
}).reset_index()
cluster_summary.columns = ['Cluster', 'Avg_Business_Density', 'Avg_Pedestrian_Count', 'Max_Pedestrian_Count', 'Min_Pedestrian_Count', 'Num_Businesses']
print("\nCluster Summary Statistics:")
print(cluster_summary)
# Save cluster summary to a CSV file for further analysis
cluster_summary.to_csv('cluster_summary_statistics.csv', index=False)
Cluster Summary Statistics:
Cluster Avg_Business_Density Avg_Pedestrian_Count Max_Pedestrian_Count \
0 -1 364216.0 296.478958 2613
1 0 9966.0 292.591712 2613
2 1 23.0 148.000000 148
3 2 5.0 47.800000 115
Min_Pedestrian_Count Num_Businesses
0 1 364216
1 1 9966
2 148 23
3 3 5
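The summary above relies on assigning six column names by position after a dictionary-style `agg`, which breaks silently if the aggregation order changes. A minimal sketch of the same summary using pandas named aggregation, where each output column is named where it is computed. The toy values are made up for illustration:

```python
import pandas as pd

# Toy stand-in for business_gdf with only the columns the summary needs
toy = pd.DataFrame(
    {
        "cluster": [0, 0, 1, 1, 1],
        "business_density": [2, 2, 3, 3, 3],
        "nearest_pedestrian_count": [100, 300, 50, 60, 70],
    }
)

# Named aggregation: each keyword becomes an output column, so no manual renaming
cluster_summary = (
    toy.groupby("cluster")
    .agg(
        Avg_Business_Density=("business_density", "mean"),
        Avg_Pedestrian_Count=("nearest_pedestrian_count", "mean"),
        Max_Pedestrian_Count=("nearest_pedestrian_count", "max"),
        Min_Pedestrian_Count=("nearest_pedestrian_count", "min"),
        Num_Businesses=("nearest_pedestrian_count", "size"),
    )
    .reset_index()
    .rename(columns={"cluster": "Cluster"})
)
print(cluster_summary)
```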
Creating an Enhanced Interactive Map of Business Clusters Using Plotly Express¶
This section creates an enhanced interactive map using Plotly Express to visualize business clusters in Melbourne. A random sample of businesses is selected for visualization, ensuring valid geometries. The map uses scatter_mapbox to plot businesses, with clusters differentiated by color and business density represented by marker size. Interactive features, such as hover tooltips displaying cluster details and business information, allow for dynamic exploration of the data. The map is styled using carto-positron and includes adjustments to marker opacity and scaling for better clarity. The final interactive map provides an insightful visualization of business cluster characteristics and their spatial distribution.
import plotly.express as px
# Sample a subset of business_gdf for visualization
sample_size = 5000
business_sample = business_gdf.sample(n=min(sample_size, len(business_gdf)), random_state=42)
# Ensure valid geometries in the sample
business_sample = business_sample[~business_sample.geometry.is_empty]
business_sample = business_sample[~business_sample.geometry.isna()]
# Create an interactive map with enhanced features
fig = px.scatter_mapbox(
business_sample,
lat=business_sample.geometry.y,
lon=business_sample.geometry.x,
color='cluster',
size='business_density',
hover_name='trading_name',
hover_data={
'business_density': True,
'nearest_pedestrian_count': True,
'cluster': True,
},
title="Enhanced Interactive Map of Business Clusters",
mapbox_style="carto-positron",
zoom=13,
height=800
)
# Adjust marker opacity and size scaling
fig.update_traces(marker=dict(opacity=0.8, sizemode='area', sizeref=2. * max(business_sample['business_density']) / (40.**2), sizemin=4))
# Add legend title and formatting
fig.update_layout(
legend_title_text="Cluster ID",
margin={"r": 0, "t": 40, "l": 0, "b": 0}
)
# Show the interactive map
fig.show()
Actionable Insights for High-Potential Retail Zones¶
This section analyzes high-potential retail zones to provide actionable insights. For each identified zone, key attributes like trading name, nearest pedestrian count, and business density are summarized, and a recommendation is generated based on high foot traffic and low competition. If no high-potential zones are identified, a message is displayed to indicate the absence of such zones. This step provides valuable strategic guidance for selecting optimal retail locations.
# Generate actionable insights for high-potential zones and display them
if not high_potential_zones.empty:
    insights = high_potential_zones[['trading_name', 'nearest_pedestrian_count', 'business_density']].copy()
    # The same recommendation text applies to every high-potential zone
    insights['recommendation'] = (
        "High potential for new retail opportunities based on high pedestrian count and low competition."
    )
    # Display the actionable insights
    print("\nActionable Insights for High-Potential Zones:")
    print(insights)
else:
    print("No high-potential zones found to generate insights.")
Actionable Insights for High-Potential Zones:
trading_name nearest_pedestrian_count \
87 Vacant 601
117 Team Building Construction Pty Ltd 601
123 Apna Desi Indian Restaurant 601
180 A2M Consulting Pty Ltd 305
264 PP&E Valuations Pty Ltd 57
... ... ...
373836 ACI Worldwide (Pacific) Pty Ltd 57
373861 Hearing Australia 149
373876 Vacant 149
374007 Bell Asset Management Limited 50
374114 RMIT Building 51 - Frederick Campbell 596
business_density recommendation
87 9966 High potential for new retail opportunities ba...
117 9966 High potential for new retail opportunities ba...
123 9966 High potential for new retail opportunities ba...
180 9966 High potential for new retail opportunities ba...
264 9966 High potential for new retail opportunities ba...
... ... ...
373836 9966 High potential for new retail opportunities ba...
373861 9966 High potential for new retail opportunities ba...
373876 9966 High potential for new retail opportunities ba...
374007 9966 High potential for new retail opportunities ba...
374114 9966 High potential for new retail opportunities ba...
[6971 rows x 4 columns]
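Every row above receives the same recommendation text, so the column does not help rank the 6971 candidates. A minimal sketch of one way to differentiate them, assuming the pedestrian-count quartiles printed earlier (39, 153, 436) are reasonable tier boundaries. The tier labels and the `traffic_tier` column are hypothetical additions, and the four rows are a small subset of values that appear in the outputs above:

```python
import numpy as np
import pandas as pd

# Quartiles of pedestriancount taken from the summary printed earlier (39, 153, 436)
bins = [-np.inf, 39, 153, 436, np.inf]
labels = ["low", "moderate", "high", "very high"]

insights = pd.DataFrame(
    {
        "trading_name": [
            "Vacant",
            "A2M Consulting Pty Ltd",
            "PP&E Valuations Pty Ltd",
            "National Australia Bank Limited",
        ],
        "nearest_pedestrian_count": [601, 305, 57, 3],
    }
)

# Assign each business a foot-traffic tier and build a tier-specific message
insights["traffic_tier"] = pd.cut(
    insights["nearest_pedestrian_count"], bins=bins, labels=labels
)
insights["recommendation"] = "Foot traffic tier: " + insights["traffic_tier"].astype(str)
print(insights)
```

Sorting by `nearest_pedestrian_count` within each tier would then give stakeholders a short list to review instead of thousands of identical rows.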
Interactive Map for Business Clusters and High-Potential Zones in Melbourne¶
This section enhances the interactive map by showing clustered businesses, high-potential zones, and dynamic marker groups together with a legend. The map is centered on Melbourne. Clustered businesses are drawn as blue markers and high-potential zones as red markers, both at the same size for consistent visualization. The blue markers are added through the Folium MarkerCluster plugin, which groups nearby markers into green and yellow circles to reduce clutter and improve readability. Tooltips show details for each marker: trading name and business density, plus the cluster ID for clustered businesses and the pedestrian count for high-potential zones. A custom HTML legend explains the color coding for clustered businesses, high-potential zones, and the dynamically generated cluster groups.
from folium.plugins import MarkerCluster
# Initialize the map centered on Melbourne
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)
# Add clustered markers for all businesses
marker_cluster = MarkerCluster().add_to(melbourne_map)
# Check for businesses in clusters
clustered_businesses = business_gdf[business_gdf['cluster'] != -1]
print(f"Number of clustered businesses: {clustered_businesses.shape[0]}")
# Add blue markers for clustered businesses
for _, row in clustered_businesses.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=5,  # Same size for blue markers
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6,
        tooltip=f"Trading Name: {row['trading_name']}<br>Cluster: {row['cluster']}<br>Business Density: {row['business_density']}"
    ).add_to(marker_cluster)
# Highlight high-potential zones
for _, row in high_potential_zones.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=5,  # Same size for red markers
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.8,
        tooltip=f"High-Potential Zone:<br>Trading Name: {row['trading_name']}<br>Pedestrian Count: {row['nearest_pedestrian_count']}<br>Business Density: {row['business_density']}"
    ).add_to(melbourne_map)
# Add a legend (customized with HTML)
legend_html = """
<div style="position: fixed;
bottom: 50px; left: 50px; width: 350px; height: 120px;
background-color: white; border:2px solid grey; z-index:1000; font-size:14px;">
<b>Legend</b><br>
<i style="background:blue; color:white; padding:5px;"> </i> Clustered Businesses<br>
<i style="background:red; color:white; padding:5px;"> </i> High-Potential Zones<br>
<i style="background:green; color:white; padding:5px;"> </i> Small Cluster Group<br>
<i style="background:yellow; color:black; padding:5px;"> </i> Large Cluster Group<br>
</div>
"""
melbourne_map.get_root().html.add_child(folium.Element(legend_html))
# Display the map
melbourne_map
Number of clustered businesses: 9994